Feb 10-11, 2016
R is a versatile, open-source programming and scripting language that is useful for statistics and for any other data-oriented task, including plotting.
It will make you more efficient and your work more repeatable and reliable.
From highest to lowest precedence:
(, )
^
/
*
+
-
1 == 1 # equality (note two equals signs, read as "is equal to")
1 != 2 # inequality (read as "is not equal to")
1 < 2 # less than
1 <= 1 # less than or equal to
1 > 0 # greater than
1 >= -9 # greater than or equal to
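The precedence list and the comparison operators above can be checked directly at the console. The following is a small sketch; the variable names are made up for illustration. One detail worth knowing: in R, `/` and `*` share the same precedence level and are evaluated left to right, as do `+` and `-`.

```r
# Exponent first, then multiplication, then addition
no_parens   <- 2 + 3 * 4 ^ 2      # 4^2 = 16, 3 * 16 = 48, 2 + 48 = 50
with_parens <- (2 + 3) * 4 ^ 2    # parentheses win: 5 * 16 = 80

# / and * have equal precedence, so this runs left to right: (8 / 2) * 4
left_to_right <- 8 / 2 * 4        # 16

# ^ binds tighter than a leading minus sign: -(2^2)
neg_square <- -2 ^ 2              # -4

# Arithmetic is evaluated before comparison, so this asks "is 2 equal to 2?"
is_equal <- 1 + 1 == 2            # TRUE

print(c(no_parens, with_parens, left_to_right, neg_square))
print(is_equal)
```

When in doubt, add parentheses: they cost nothing and make the intent explicit.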
Most of R's power and flexibility comes from functions.
A function is a saved object that takes inputs to perform a task.
Functions take in information and return outputs.
A function takes zero, one, or many arguments (also called parameters), depending on the function, and returns a value.
To call a function, type its name followed by parentheses (). Arguments go inside the parentheses and are separated by commas.
name_of_function(arg1, arg2, arg3)
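As a concrete illustration of the calling pattern above, here are a few base R functions called with zero, one, and several arguments. The variable names on the left are only for illustration.

```r
# Zero arguments: the parentheses are still required
today <- Sys.Date()

# One argument
root <- sqrt(16)                        # 4

# Several arguments, separated by commas; arguments can be passed by name
rounded <- round(3.14159, digits = 2)   # 3.14

# Positional and named arguments can be mixed
steps <- seq(1, 10, by = 3)             # 1 4 7 10

print(today)
print(root)
print(rounded)
print(steps)
```

Typing a function name without the parentheses, such as `sqrt`, prints the function itself instead of calling it.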
project
├── R
│   └── functions.R
├── data
│   ├── data1.csv
│   └── data2.csv
├── doc
│   └── manuscript.doc
├── out
│   └── summaries.csv
├── plots
│   ├── plot1.png
│   └── plot2.png
├── 01-load_clean_data.R
├── 02-analysis.R
├── 03-plotting.R
└── my_project.Rproj
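One way to set up a skeleton like the one above is from R itself. This is a minimal sketch, not a required workflow: it assumes you want the folders created under a temporary directory (so nothing in your working directory is touched) and it only creates empty script files. Replace `root` with a real path to use it for an actual project.

```r
# Assumption: build the skeleton in a temporary location for demonstration
root <- file.path(tempdir(), "project")

# Sub-folders taken from the layout above
subdirs <- c("R", "data", "doc", "out", "plots")
for (d in subdirs) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

# Numbered script names sort in the order the scripts should be run
scripts <- c("01-load_clean_data.R", "02-analysis.R", "03-plotting.R")
created <- file.create(file.path(root, scripts))

print(list.files(root))
```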
"There are only two hard things in Computer Science: cache invalidation and naming things." – Phil Karlton
x is easy to type, but may not mean much.
Separate words with a period (my.var) or an underscore (my_var), or use camelCase.
Avoid the names of existing functions (mean, sum, log).
character: "a", "swc"
numeric: 2, 15.5
integer: 2L (the L tells R to store this as an integer)
logical: TRUE, FALSE
complex: 1+4i (complex numbers with real and imaginary parts)
Grammar of Graphics:
Access a single column:
my_df$col_name
Subset rows and columns:
my_df[rows, columns]
rows and columns can each be an integer, character, or logical vector; leave one blank to keep all rows or all columns.
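The subsetting forms above can be tried on a small data frame. The data frame and its column names (`id`, `site`, `value`) are invented for this sketch.

```r
# A hypothetical data frame to practise on
my_df <- data.frame(
  id    = 1:4,
  site  = c("a", "b", "a", "c"),
  value = c(2.5, 3.1, 4.8, 1.2)
)

# Single column, returned as a vector
values <- my_df$value

# Rows by integer position, all columns (note the blank after the comma)
first_two <- my_df[1:2, ]

# Columns by name (character), all rows
id_value <- my_df[, c("id", "value")]

# Rows by logical condition: keep rows where value is greater than 3
large <- my_df[my_df$value > 3, ]

# Rows and columns together: values recorded at site "a"
site_a_values <- my_df[my_df$site == "a", "value"]

print(large)
print(site_a_values)
```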
dplyr implements the following verbs useful for data manipulation:
group_by(): set the grouping variable(s)
filter(): focus on a subset of rows
select(): focus on a subset of variables
mutate(): add new columns (also mutate_each())
summarize(): reduce each group to a single row of summary statistics
arrange(): re-order the rows
select()
group_by()
summarize()
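The verbs above are usually chained together with the pipe operator `%>%`. Here is a minimal sketch that assumes the dplyr package is installed and uses R's built-in `mtcars` dataset. The cutoff of 100 horsepower and the new column names (`wt_kg`, `mean_mpg`, `mean_wt_kg`, `n_cars`) are arbitrary choices for illustration.

```r
library(dplyr)

result <- mtcars %>%
  filter(hp > 100) %>%                  # keep rows for cars above 100 horsepower
  select(cyl, mpg, wt) %>%              # keep only the columns we need
  mutate(wt_kg = wt * 453.592) %>%      # wt is recorded in units of 1000 lb
  group_by(cyl) %>%                     # one group per cylinder count
  summarize(
    mean_mpg   = mean(mpg),
    mean_wt_kg = mean(wt_kg),
    n_cars     = n()
  ) %>%                                 # one summary row per group
  arrange(desc(mean_mpg))               # most fuel-efficient group first

print(result)
```

Each step receives the data frame produced by the previous step, so the pipeline reads top to bottom in the order the operations are applied.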